{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "41a6c181328bf5f4",
   "metadata": {},
   "source": [
    "# Tablestore Demo\n",
    "\n",
    "This guide shows you how to directly use our `DocumentStore` abstraction backed by Tablestore. By putting nodes in the docstore, this allows you to define multiple indices over the same underlying docstore, instead of duplicating data across indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cbc6f245a19ece25",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-storage-docstore-tablestore\n",
    "%pip install llama-index-storage-index-store-tablestore\n",
    "%pip install llama-index-vector-stores-tablestore\n",
    "\n",
    "%pip install llama-index-llms-dashscope\n",
    "%pip install llama-index-embeddings-dashscope\n",
    "\n",
    "%pip install llama-index\n",
    "%pip install matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a54d1c43-4b7f-4917-939f-a964f6f3dafc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa67fa07-1395-4aab-a356-72bdb302f6b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d12d766-3ca8-4012-9da2-248be80bb6ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, StorageContext\n",
    "from llama_index.core import VectorStoreIndex, SimpleKeywordTableIndex\n",
    "from llama_index.core import SummaryIndex\n",
    "from llama_index.core.response.notebook_utils import display_response\n",
    "from llama_index.core import Settings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14b4fc922fc82db1",
   "metadata": {},
   "source": [
    "#### Config Tablestore\n",
    "Next, we use tablestore's docsstore to perform a demo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d118ec262452d881",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "os.environ[\"tablestore_end_point\"] = getpass.getpass(\"tablestore end_point:\")\n",
    "os.environ[\"tablestore_instance_name\"] = getpass.getpass(\n",
    "    \"tablestore instance_name:\"\n",
    ")\n",
    "os.environ[\"tablestore_access_key_id\"] = getpass.getpass(\n",
    "    \"tablestore access_key_id:\"\n",
    ")\n",
    "os.environ[\"tablestore_access_key_secret\"] = getpass.getpass(\n",
    "    \"tablestore access_key_secret:\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9d986db911e80c5",
   "metadata": {},
   "source": [
    "#### Config DashScope LLM\n",
    "Next, we use dashscope's llm to perform a demo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa93fff2bdc123c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"DASHSCOPE_API_KEY\"] = getpass.getpass(\"DashScope api key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d950023",
   "metadata": {},
   "source": [
    "#### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b582ec16",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f6dd9d5f-a601-4097-894e-fe98a0c35a5b",
   "metadata": {},
   "source": [
    "#### Load Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7cdaf9d-cfbd-4ced-8d4e-6eef8508224d",
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = SimpleDirectoryReader(\"./data/paul_graham/\")\n",
    "documents = reader.load_data()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bae82b55-5c9f-432a-9e06-1fccb6f9fc7f",
   "metadata": {},
   "source": [
    "#### Parse into Nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f97e558a-c29f-44ec-ab33-1f481da1a6ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "\n",
    "nodes = SentenceSplitter().get_nodes_from_documents(documents)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aff4c8e1-b2ba-4ea6-a8df-978c2788fedc",
   "metadata": {},
   "source": [
    "#### Init Store/Embedding/LLM/StorageContext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ba8b0da-67a8-4653-8cdb-09e39583a2d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.storage.docstore.tablestore import TablestoreDocumentStore\n",
    "from llama_index.storage.index_store.tablestore import TablestoreIndexStore\n",
    "from llama_index.vector_stores.tablestore import TablestoreVectorStore\n",
    "from llama_index.embeddings.dashscope import (\n",
    "    DashScopeEmbedding,\n",
    "    DashScopeTextEmbeddingModels,\n",
    "    DashScopeTextEmbeddingType,\n",
    ")\n",
    "from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels\n",
    "\n",
    "embedder = DashScopeEmbedding(\n",
    "    model_name=DashScopeTextEmbeddingModels.TEXT_EMBEDDING_V3,  # default demiension is 1024\n",
    "    text_type=DashScopeTextEmbeddingType.TEXT_TYPE_DOCUMENT,\n",
    ")\n",
    "\n",
    "dashscope_llm = DashScope(\n",
    "    model_name=DashScopeGenerationModels.QWEN_MAX,\n",
    "    api_key=os.environ[\"DASHSCOPE_API_KEY\"],\n",
    ")\n",
    "Settings.llm = dashscope_llm\n",
    "\n",
    "docstore = TablestoreDocumentStore.from_config(\n",
    "    endpoint=os.getenv(\"tablestore_end_point\"),\n",
    "    instance_name=os.getenv(\"tablestore_instance_name\"),\n",
    "    access_key_id=os.getenv(\"tablestore_access_key_id\"),\n",
    "    access_key_secret=os.getenv(\"tablestore_access_key_secret\"),\n",
    ")\n",
    "\n",
    "index_store = TablestoreIndexStore.from_config(\n",
    "    endpoint=os.getenv(\"tablestore_end_point\"),\n",
    "    instance_name=os.getenv(\"tablestore_instance_name\"),\n",
    "    access_key_id=os.getenv(\"tablestore_access_key_id\"),\n",
    "    access_key_secret=os.getenv(\"tablestore_access_key_secret\"),\n",
    ")\n",
    "\n",
    "vector_store = TablestoreVectorStore(\n",
    "    endpoint=os.getenv(\"tablestore_end_point\"),\n",
    "    instance_name=os.getenv(\"tablestore_instance_name\"),\n",
    "    access_key_id=os.getenv(\"tablestore_access_key_id\"),\n",
    "    access_key_secret=os.getenv(\"tablestore_access_key_secret\"),\n",
    "    vector_dimension=1024,  # embedder dimension is 1024\n",
    ")\n",
    "vector_store.create_table_if_not_exist()\n",
    "vector_store.create_search_index_if_not_exist()\n",
    "\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=docstore, index_store=index_store, vector_store=vector_store\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "853dd1abee90d2f",
   "metadata": {},
   "source": [
    "#### Add to docStore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e88378b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "storage_context.docstore.add_documents(nodes)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "528149c1-5bde-4eba-b75a-e8fa1da17d7c",
   "metadata": {},
   "source": [
    "#### Define & Add Multiple Indexes\n",
    "\n",
    "Each index uses the same underlying Node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "316fb6ac-2031-4d17-9999-ffdb827f46d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://gpt-index.readthedocs.io/en/latest/api_reference/indices/list.html\n",
    "summary_index = SummaryIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6b2141-fc77-4dec-891b-d4dad0633b35",
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://gpt-index.readthedocs.io/en/latest/api_reference/indices/vector_store.html\n",
    "vector_index = VectorStoreIndex(\n",
    "    nodes,\n",
    "    insert_batch_size=20,\n",
    "    embed_model=embedder,\n",
    "    storage_context=storage_context,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "144bc7eb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://gpt-index.readthedocs.io/en/latest/api_reference/indices/table.html\n",
    "keyword_table_index = SimpleKeywordTableIndex(\n",
    "    nodes=nodes,\n",
    "    storage_context=storage_context,\n",
    "    llm=dashscope_llm,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ccbe86c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "44"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NOTE: the docstore still has the same nodes\n",
    "len(storage_context.docstore.docs)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1059ec3c",
   "metadata": {},
   "source": [
    "#### Test out saving and loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c25576c8b039e74d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# NOTE: docstore and index_store is persisted in Tablestore by default\n",
    "# NOTE: here only need to persist simple vector store to disk\n",
    "storage_context.persist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9155c1a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "c05fec2a-ac87-4761-beeb-0901f9e6530e d0b021ed-3427-46ad-927d-12d72752dbc4 2e9bfc3a-5e69-408a-9430-7b0c8baf3d77\n"
     ]
    }
   ],
   "source": [
    "# note down index IDs\n",
    "list_id = summary_index.index_id\n",
    "vector_id = vector_index.index_id\n",
    "keyword_id = keyword_table_index.index_id\n",
    "print(list_id, vector_id, keyword_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "555de7fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import load_index_from_storage\n",
    "\n",
    "# re-create storage context\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    docstore=docstore, index_store=index_store, vector_store=vector_store\n",
    ")\n",
    "\n",
    "summary_index = load_index_from_storage(\n",
    "    storage_context=storage_context,\n",
    "    index_id=list_id,\n",
    ")\n",
    "keyword_table_index = load_index_from_storage(\n",
    "    llm=dashscope_llm,\n",
    "    storage_context=storage_context,\n",
    "    index_id=keyword_id,\n",
    ")\n",
    "# You need to add \"vector_store=xxx\" to StorageContext to load vector index from Tablestore\n",
    "vector_index = load_index_from_storage(\n",
    "    insert_batch_size=20,\n",
    "    embed_model=embedder,\n",
    "    storage_context=storage_context,\n",
    "    index_id=vector_id,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c5bc40a7",
   "metadata": {},
   "source": [
    "#### Test out some Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8db82de3",
   "metadata": {},
   "outputs": [],
   "source": [
    "Settings.llm = dashscope_llm\n",
    "Settings.chunk_size = 1024"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "244bc6ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = summary_index.as_query_engine()\n",
    "list_response = query_engine.query(\"What is a summary of this document?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6cbe77ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_response(list_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02b800ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = vector_index.as_query_engine()\n",
    "vector_response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70b63767",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** Growing up, the author was involved in writing and programming outside of school. Initially, they wrote short stories, which they now consider to be not very good, as they lacked much plot and focused more on characters' emotions. In terms of programming, the author started with an IBM 1401 at their junior high school, where they attempted to write basic programs in Fortran using punch cards. Later, after getting a TRS-80 microcomputer, the author delved deeper into programming, creating simple games, a program to predict the flight height of model rockets, and even a word processor that their father used for writing."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_response(vector_response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b93478b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = keyword_table_index.as_query_engine()\n",
    "keyword_response = query_engine.query(\n",
    "    \"What did the author do after his time at YC?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8044da9c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**`Final Response:`** After his time at YC, the author decided to take up painting, dedicating himself to it to see how good he could become. He spent most of 2014 focused on this. However, by November, he lost interest and stopped. Following this, he returned to writing essays and even ventured into topics beyond startups. In March 2015, he also began working on Lisp again."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display_response(keyword_response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
